Initial LMBuddy class for running jobs #84
Conversation
This is a duplicate of the other hf_config.yaml file, since it already has the quantization section specified.
)
print("Logging artifact for model checkpoint...")
artifact_loader.log_artifact(model_artifact)
ckpt_path, artifact_config = None, None
In a follow-up PR (https://mzai.atlassian.net/browse/RD2024-152), I would like to refactor a bit how we are generating artifacts and results in these methods.
The issue is that because the tracking field is optional, we repeatedly end up writing two code branches: (1) one for when we initialize a W&B run and create an artifact, and (2) one for when we run the job without tracking or artifacts. It's almost like we want something like maybe_initialize_wandb_run that handles the optionality of the tracking service.
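For illustration, a minimal sketch of what such a helper could look like, assuming a WandbRunConfig-like tracking object and the standard wandb API; the helper name and signature here are hypothetical:

from contextlib import contextmanager

import wandb


@contextmanager
def maybe_initialize_wandb_run(tracking_config=None):
    """Yield a W&B run when tracking is configured, otherwise yield None.

    This collapses the "with tracking" and "without tracking" branches
    into a single code path inside the job methods.
    """
    if tracking_config is None:
        yield None
        return
    run = wandb.init(
        name=tracking_config.name,
        project=tracking_config.project,
        entity=tracking_config.entity,
    )
    try:
        yield run
    finally:
        run.finish()

A job method could then write "with maybe_initialize_wandb_run(config.tracking) as run:" and only guard the artifact-logging calls on run being not None.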
Looking now! Pulled the branch, reading the instructions.
from ray.job_submission import JobSubmissionClient
from pathlib import Path
from lm_buddy import LMBuddy
from lm_buddy.jobs.configs import (
FinetuningJobConfig,
FinetuningRayConfig,
LMHarnessJobConfig,
LMHarnessEvaluationConfig,
)
from lm_buddy.integrations.huggingface import (
AutoModelConfig,
TextDatasetConfig,
TrainerConfig,
AdapterConfig,
)
from lm_buddy.integrations.wandb import WandbRunConfig
# Base model to finetune from HuggingFace
model_config = AutoModelConfig(load_from="distilgpt2")
# Text dataset for finetuning
dataset_config = TextDatasetConfig(
load_from="imdb",
split="train[:100]",
text_field="text",
)
# HuggingFace trainer arguments
trainer_config = TrainerConfig(
max_seq_length=256,
per_device_train_batch_size=8,
learning_rate=1e-4,
num_train_epochs=1,
logging_strategy="steps",
logging_steps=1,
save_strategy="epoch",
save_steps=1,
)
# LORA adapter settings
adapter_config = AdapterConfig(
peft_type="LORA",
task_type="CAUSAL_LM",
r=8,
lora_alpha=16,
lora_dropout=0.2,
)
# Define tracking for finetuning run
tracking_config = WandbRunConfig(
name="example-finetuning",
project="lm-buddy-examples", # Update to your project name
entity="mozilla-ai", # Update to your entity name
)
# Ray train settings
ray_config = FinetuningRayConfig(
use_gpu=False, # Change to True if GPUs are available on your machine
num_workers=2,
)
# Full finetuning config
finetuning_config = FinetuningJobConfig(
model=model_config,
dataset=dataset_config,
trainer=trainer_config,
adapter=adapter_config,
tracking=tracking_config,
ray=ray_config,
)
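From here, the job can be kicked off through the new class. A minimal usage sketch, assuming finetune accepts the job config directly (the exact call signature may differ):

# Run the finetuning job via the new LMBuddy interface
buddy = LMBuddy()
buddy.finetune(finetuning_config)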
Nothing is changing in how you specify the cluster information. The CLI of the package is not changed, so you can use the same commands as an entrypoint to a Ray job submission using their SDK.
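For reference, a rough sketch of submitting that entrypoint with Ray's job SDK; the cluster address, runtime environment, and CLI command shown here are assumptions for illustration, not taken from this PR:

from ray.job_submission import JobSubmissionClient

# Assumed local Ray dashboard address; point this at your cluster
client = JobSubmissionClient("http://127.0.0.1:8265")

client.submit_job(
    # Hypothetical CLI invocation; use the package's actual entrypoint command
    entrypoint="python -m lm_buddy finetune --config finetuning_config.yaml",
    runtime_env={
        "working_dir": ".",  # assumed: ship the current directory to the cluster
        "pip": ["lm-buddy"],  # assumed package name
    },
)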
Tested and left some comments; unit tests pass and the sample job works!
Thanks! I'm a bit sidetracked at the moment, but will address most of them in the next few hours.
Thanks for addressing! LGTM
What's changing
- The run_job method is removed in favor of a class LMBuddy that has methods for finetune and evaluate.
- A LoadableAssetPath type and associated data structures are added to represent any load_from path for a HF asset (see the sketch below). See inline comments for motivation for this change.

Note that the CLI API is not changed by these internal changes, so you can still execute the package as a Ray entrypoint in the same manner as before.
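To make the load_from bullet concrete, a small sketch: the Hub repo form matches the finetuning example above, while the local-path form is an assumption for illustration and may not reflect the final LoadableAssetPath behavior:

# Load a model by HuggingFace Hub repo ID (as in the finetuning example above)
hub_model = AutoModelConfig(load_from="distilgpt2")

# Assumed: the same field pointing at a local checkpoint directory
local_model = AutoModelConfig(load_from="/path/to/local/checkpoint")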
How to test it
Related Jira Ticket
Additional notes for reviewers
In follow-up PRs into this dev branch, I would like to do the following: